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Abstract 

(N , 

It is a common problem in signal processing to remove a non-ideal detectors resolution 
from a measured probability density function of some physical quantity. This process is 
called unfolding (a special case is the deconvolution) , and it would involve the inversion of 
the integral operator describing the folding (i.e. the smearing of the detector). Currently, 
t— I . there is no unbiased method known in literature for this issue (here, by unbiased we mean 

^ | those approaches, which do not assume an ansatz for the unknown probability density 

■ function). 

There is a well-known series expansion (Neumann series) in functional analysis for 
perturbative inversion of specific operators on Banach spaces. However, operators that 
\q ' appear in signal processing (e.g. folding and convolution of probability density functions), 

in general, do not satisfy the usual convergence condition of that series expansion. This 
article provides some theorems on the convergence criteria of a similar series expansion for 
this more general case, which is not covered yet by the literature. 

The main result is that a series expansion provides a robust unbiased unfolding and 
deconvolution method. For the case of the deconvolution, such a series expansion can 
always be applied, and the method always recovers the maximum possible information 
about the initial probability density function, thus the method is optimal in this sense. A 
very significant advantage of the presented method is that one does not have to introduce 
ad hoc frequency regulations etc., as in the case of usual naive deconvolution methods. For 
the case of general unfolding problems, we present a computer-testable sufficient condition 
for the convergence of the series expansion in question. 

Some test examples and physics applications are also given. The most important 
physics example shall be (which originally motivated our survey on this topic) the case of 
7T° -^7 + 7 particle decay: we show that one can recover the initial tt° momentum density 
function form the measured single 7 momentum density function by our series expansion. 



1 Introduction 

In experimental physics, one commonly faces the following problem. The probability density 
function of a given physical quantity is to be measured (e.g. by histograming) with an experi- 
mental apparatus, but a non-ideal detector smears the signal. The question arises: if one knows 
the behavior of the detector quite well (i.e. one knows the response function of the detector), 
how can one reconstruct the original undistorted probability density function of the given phys- 
ical quantity. Specially: there is an unknown probability density function x 1— > f(x) (this is the 
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unknown probability density function of the undistorted physical quantity), and the measured 
density function is obtained by y h- > g(y) = f p(y\x)f(x) dx (where the conditional density 
function (y,x) i— > p(y\x) describes the smearing of the measurement apparatus, also called as 
response function), then under which conditions and how can one re-obtain (i.e. unfold) the 
original probability density function / by measuring g and by knowing p. We formalize this 
problem below. (In the text we shall abbreviate probability density function by pdf, conditional 
probability density function by cpdf, and the notion Lebesgue almost everywhere or Lebesgue 
almost every, known in measure theory, by ae.) 

Let X and Y be two finite dimensional real vector spaces, each equipped with the Lebesgue 
measure (which is unique up to a global positive constant factor). Then L 1 (X) and L 1 (Y) 
denote the space of Lebesgue integrable function classes X — > C and Y — ■> C, respectively. 

Definition 1. Let p : Y x X — > IRq , (y,x) i— > p(y\x) is a cpdf over the product space Y x X, 
(i.e. it is a nonnegative valued Lebesgue measurable function on the product space which satisfies 
for all x G X : j p(y\x) dy = 1). Then the linear operator 

A p : L\X) - L\Y), (x h- /(a;)) » (v " j P(y\^)f(x) dx) , 
is called the folding operator by p. 



Remark 2. The remarks below are trivial. 



1. By Fubini's theorem, this linear operator is well defined. 

2. By the monotonicity of integration, such an operator is continuous: 
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p(y\x)f(x) dx 



dy< J J p(y\x)\f(x)\ dx dy 



LHX) 



It is also trivial that we can saturate the above inequality by taking ae nonnegative 
function f, thus \\A p \\ = 1 aiso follows. 



Our main interest will be the question: when is the operator A p invertible, and how the 
inverse operator could be evaluated on given pdfs in a constructive way. 



1.1 A special case: deconvolution problem 

A special case of the unfolding problem is the so called deconvolution, i.e. when Y = X and the 
cpdf p is translation invariant in the sense that for all a G X and for all y, x G X : p(y\x + a) = 
p(y — a\x). In this case, the cpdf p can be expressed by a pdf rj in the way p(y\x) = rj(y — x) 
for all x, y G X. 

Definition 3. Let 7] be a pdf (i.e. it is a nonnegative valued Lebesgue integrable function on X 
such that j rj(x) dx = I). Then the linear operator 

A v -. l\x)^l\x), f»ri*f=(y» J v(y-x)f(x)dx^j . 

is called the convolution operator by r]. 
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We will state here a few properties of a convolution operator (see e.g. [T], [2]). 

1. A convolution operator is not onto, and its image is not closed. 

2. The range of a convolution operator is dense if and only if the Fourier transform of the 
convolver function is nowhere zero (Wiener's approximation theorem). 

3. A convolution operator is one-to-one if and only if the set of zeros of the Fourier transform 
of the convolver function has zero Lebesgue measure. 

Remark 4. As a consequence, the inverse of a convolution operator - if it exists at all - is 
not continuous. Indeed, the convolution operator is everywhere defined and continuous, so it 
is closed, thus its inverse is closed as well; since the domain of the inverse is not closed, the 
inverse cannot be continuous by Banach's closed graph theorem. 

We see that the characterization of a convolution operator is strongly related to the Fourier 
operators: 



We denote by C^(X*) the space of continuous functions X* — > C which have zero limit at the 
infinity. Here X* is the dual space of X, and for any y G X* and x G X the number (y\x) 
means the value of the covector y on the vector x. 

The Fourier operators have the following basic properties (|Ij): 

1. C^(X*) is a Banach space with the maximum norm, F± is continuous and ||i*±|| = 1. 

2. The Fourier operators are one-to-one. Thus, the inverse Fourier operators F± x exist. 

3. The range of F± is dense in C^(X*), however it is not the whole space. Thus, again by 
Banach's closed graph theorem, we infer that the operator F±* is not continuous. 

4. If f,g G L}(X), then F±(f * g) = F±(f) ■ F±(g) (convolution theorem). 

The naive deconvolution procedure then goes in the following way: 

1. take the Fourier transform of the convolution, F±(r)* /), 

2. divide the above function by F±r], 

3. calculate the inverse Fourier transform; 



The listed properties of the convolution operator, however, make it practically impossible 
to apply the deconvolution procedure in signal processing. The reason is that the measured 
density function (which is approximated by a normalized histogram in general) is not in the 
range of the convolution operator: it can be considered as the sum of a pdf in the range of the 
operator, plus a noise (e.g. Poissonian noise, originating from the statistical fluctuations of the 
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entries in the histogram bins) outside the range of the operator in general. When applying the 
deconvolution procedure, the inverse operator can be calculated on the first term, however the 
deconvolution would give a nonsense result on the noise term, as it is not in the range of the 
convolution operator, thus leading to a nonsense result on the whole. Various noise suppression 
methods (high frequency cutoffs) are introduced as symptomatic treatment of this problem, 
however these solutions are based on rather intuitive approaches not on sound mathematics, and 
are highly non-unique (thus the derived solutions depend on the noise suppression approach). 
This is because the non-continuity of the inverse of the convolution operator: a small change 
caused by the high frequency regulation in the Fourier spectrum is not guaranteed to stay 
small after the deconvolution. This effect, in general, is referred to as: the deconvolution 
problem (or unfolding problem) is ill posed, i.e. one cannot get a robust method to do the 
deconvolution (or unfolding). Furthermore, if the Fourier transform of the convolver pdf has 
zeros in the finite, then the naive deconvolution becomes even more ambiguous: one has to 
introduce regulation procedures even at certain finite frequencies (at the zeros of the Fourier 
transform of the convolver pdf) . 

Despite of the above difficulties, we developed a robust perturbative method, which solves 
the problem. Our method of series expansion gives a robust and stable method for deconvolu- 
tion. Using this method, the problem of zeros of the Fourier transform of the convolver pdf in 
the finite does not arise at all, furthermore one does not have to reconsider any high frequency 
regulations on a case-by-case intuitive basis. Plus, our series expansion is optimal in the sense 
that it recovers the maximum possible information about the initial pdf even in the case when 
the convolution in question is not even invertible. 



2 Inverse operator by a series expansion 

There exists a basic theorem providing a perturbative method to obtain the inverse of continuous 
linear operators on a Banach space which are not too far from the identity operator. That 
theorem in its original form, however, does not apply to the case of convolution (or folding) 
operators. The main result of this paper is a generalization of that theorem to the case of 
convolution operators. 

Now we recall the series expansion (called also Neumann series) for the inverse of an oper- 
ator. 

Let A be a continuous linear operator on a Banach space such that \\I — A\\ < 1, where / is 
the identity operator. Then the operator A is one-to-one and onto and its inverse is continuous, 
and the series N \— > J2n=o(^ ~ A) n is absolutely convergent to A~ x . 

The proof is pretty simple, and can be found in any textbooks of functional analysis (e.g. [H], 
[9]). It will be instructive, however, to cite the proof, as later we will strengthen this theorem. 

First, it is easily shown by induction that ^^(J-AjM = A '£n=o( I - A ) n = I-(I-A) N+1 . 
The condition \\I — A\\ < 1 guarantees that the sequence N i— > (I — A) N+1 converges to zero 
in the operator norm, and the absolute convergence of the series N \— > J2n=o(I ~~ A) n , thus 
(En^oC - A) n ) A = A (J2Z (I ~ A) n ) = /, i.e. A' 1 = A) n . As A~ l is expressed as 

a limit of a series of continuous operators which is convergent in the operator norm, we infer 
that A -1 is continuous. 

Remark 5. The conditions of the above series expansion theorem fail for any folding operator 
A p . 
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1. We can observe that the series expansion is only meaningful for the case of a folding 
operator only when the spaces X and Y are the same. 

2. Let us assume that Y = X . Then, it is easily obtained that a folding operator A p does 
not satisfy the required condition \\I — A p \\ < 1. It is trivial by the triangle inequality of 
norms that \\I — A p \\ < 2. We will show now that this inequality can be saturated for a 
wide class of cpdfs. Let us choose an arbitrary point y G X , and consider the series of 
pdfs n I— > \{K (y )) XK n (v)> where K n (y) are compact sets having non-zero Lebesgue measure 
\{K n (y)), such that K n+1 (y) C K n (y) for all n G N and n K n (y) = {y}. Then, 

neN 



(I ~ A p ) 
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By making use of the fact that the integral of any pdf is 1, one can write 
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for the hrst term. For the second term, one can use the monotonity of integration: 
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:X K „ M ( X ) dx dz 



Here, at the second equality J x ^ \ y ^ X Kn ( y )( z ) dz = 1 was used, and the fact that the 
integral of any pdf over a Borel set is smaller or equal to 1 was used at the third equality. 
Thus, we infer the inequality: 



(I ~ A p ) 



1 
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If the point (y,y) G X x X is a Lebesgue point of p, then we will shown that the 
integral term goes to zero when n goes to infinity, thus saturating our inequality in 
question. If a function g : X — > C is locally integrable, then a point y G X is 
called a Lebesgue point of g if Km HK l n{y)) J Kn(y) \g(x) - g(y)\ dx = 0. If y G X 
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is a Lebesgue point for g, then by the monotonity of integration it also follows that 
lim X ( K l ^ f K g(x) dx = g(y). Applying this result for p on the product space X x X 

(assuming that the point (y,y) G X x X is a Lebesgue point of p), we have that the 
sequence n i-> x{K \ (y)) A(j ^ (y)) J Kn(y) J Kn(y) p(z\x) dx dz is convergent to p(y\y). Multiply- 
ing this sequence by the sequence n \— > X(K n (y)) (which is convergent to zero), we infer 
that lim X {K {y)) Ik (y) Ik (y) P( z \ x ) ^ x = ®- ^ P JS continuous, then every point in 
X x X is a Lebesgue point of p. Thus, we have shown that if the cpdf p is continuous, 
then || J — A p \\ =2 holds, therefore the original theorem of Neumann cannot be applied 
directly for a folding operator with continuous cpdf. 

Apart from the above remark, the reason is obvious for the obstruction of inverting the 
convolution on the operator level: as the convolution operators are not onto in general, one 
only can try to invert the operator on a function in the range of the operator. We try to 
modify the theorem for the case of convolution operators requiring, instead of convergence 
in the operator series, the convergence of the series N J2n=o(^ ~ A) n (Af) in some sense 
(equivalently, the convergence of the sequence N i— > (/ — A) N+1 f in the same sense), for any 

feL\x). 

For getting a convenient result, let us recall that the elements of L 1 (X) can be viewed as 
regular tempered distributions. The Fourier transformations can be extended to the space of 
tempered distributions, where they are one-to-one and onto, continuous, and their inverse is 
also continuous ([8J, |9J). The proof of convergence will be performed on the Fourier transforms 
of the functions, then the result will be brought back by using the continuity of the inverse 
Fourier transformation on the space of tempered distributions. 

Theorem 6. Let A v be a convolution operator for some r/ 6 L 1 (X). Let Z be the set of zeros 
of the function F±rj. If the inequality 

\l-F ± r]\ < 1 

is satisfied everywhere outside Z, then for all f G L 1 ^) the series 

N 



n=0 

is convergent in the space of tempered distributions, and 

oo 

£(/ - A v ) n (A v f) = f - F ± \ Xz F ± f). 

n=0 □ 

Proof Assume that |1 — F±r)\ < 1 holds everywhere outside Z. Let V denote the subset of 
X* where F±r] is nonzero. It is clear that V and Z are disjoint Lebesgue measurable sets 
and X* = V U Z. Trivially, the sequence N i— > |1 — F±r]\ N+l converges pointwise to on V, 
furthermore |1 — F±r]\ N+l = 1 on Z for all N. For every / G L l (X) and rapidly decreasing test 
function ip on X* , we have 

/(I - F ±V {y)) N+1 F ± f{y) ■ p(y) dy - J Xz ■ F ± f(y) ■ <p(y) dy 
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/ (1 - F ± r,(y)) N+1 F ± f(y) • v (y) dy 
Jv 

[ \l-F ±V (y)\ N+1 \F ± f(y)\-\ V (y)\ dy. 
Jv 



The series of Lebesgue integrable functions N \— > |1 — F±r/\ N+1 \ F±f\ ■ \ip\ converges point- 
wise to zero on V, and |1 - F ± n\ N+1 \F±f\ ■ \cp\ < |1 - F±rj\ 1 \F±f\ ■ \ip\ for all N, thus by 
Lebesgue's theorem of dominated convergence the last term of the inequality tends to zero 
when N goes to infinity. Therefore, the function series N i— > (1 — F±r]) N+1 (F±f) is convergent 
in the space of tempered distributions to the function x z F±j . Applying the inverse Fourier 
transformation F^ 1 and using the continuity of the inverse Fourier transformation in the space 
of tempered distributions, we get the desired result, as by the convolution theorem we have 
F± 1 ((1 - F ± rj) N+1 (F ± f)) = (I- A v ) N+1 f, and because 



N 



f-J2( I - A vT(A v f) = (I-A v ) N ^f 



n=0 



for all N. M 

Remark 7. Let us assume that the condition of the theorem holds. Then it is quite evident 
that 

1. If Z has zero Lebesgue measure (which holds if and only if A v is one-to-one), then 
F± l (.X z F±f) = 0- This means that the series in question always restores the arbitrarily 
chosen original function f if and only if A v is one-to-one, i.e. if and only if F±r) is ae 
nowhere zero. 

2. If Z has nonzero Lebesgue measure, our series also converges, and restores the maximum 
possible information about the original function f, namely the tempered distribution 
f — F± 1 (x z F±f)- However, this tempered distribution may not be a function in general. 
If the function x z F±f is not a continuous function which tends to zero at the infinity, 
then F^ 1 (x z F±f) cannot be an integrable function. As we shall see in the next section, 
if the function x z F±f JS n °t a continuous function which is bounded, then F± 1 (x z F±f) 
cannot even be a measure with finite variation. 

3. Let now rj and f be pdfs, and suppose that F± 1 (x z F±f) — 0. Then our convergence 
result has the following meaning in probability theory: the series converges in the sense 
that the expectation values of all rapidly decreasing test functions on X are restored. 
Namely, for any rapidly decreasing test function ip on X we have that: 

i™o / " A ^ A ^ ^ • ^ dX = j • ^ 

It can be easily observed that the condition of our previous theorem is not always satisfied 
for a pdf rj. E.g. if 77 is a Gaussian pdf centered to zero, then it is satisfied, but e.g. if rj is 
a uniform pdf on a rectangular domain centered to zero, then the condition is not satisfied. 
Therefore, one could think that the applicability of our deconvolution theorem is rather limited. 
This is not the case, however, as stated in our following theorem. 
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Theorem 8. Let r) be a pdf on X . Then for any f G L 1 (X) the series 



N 



N - A Pv A v ) n A Pri (A v f) 



n=0 



is convergent in the space of tempered distributions, and 



oo 



5> - Ap.AXAp^AJ) = f- F±\ Xz F±f) 



n=0 



where Z := {y G X*\F±r](y) = 0}. Here P is the parity operator on L 1 (X), namely Pf(x) := 
f(-x) for all f G L 1 (X) and x G X . n 

Proof Let us observe, that if F±r] is real valued and nonnegative for a pdf rj, then |1 — F±r}\ < 1 
is automatically satisfied outside Z. This is because 

1. by our assumption < F±r] outside Z, thus we conclude that 1 — F±r] < 1 outside Z, and 

2. by the inequality \F±r]\ < \\r)\\ = 1, we conclude that < 1 — \F±r)\ = 1 — F±rj. 

It is easy to see that F±Pi] = F±r) (where the bar denotes complex conjugation) for a pdf 77, 
because rj is real valued. Thus, we have that F±(Pr)*r}) = \F±r]\ 2 is real valued and nonnegative, 
consequently, by our previous observation, the inequality |1 — F±(Pr)*rj)\ < 1 holds outside 
Z, i.e. our previous theorem can be applied by replacing the convolution operator A v with the 
double convolution operator Ap v A v . ■ 

When applying this theorem in practice, one should take into account that the measured pdf 
(which is obtained by histograming in general) is not in the range of the convolution operator, 
but it can be viewed as the sum of a pdf in the range of the convolution operator (if our model 
is accurate enough) and a noise term. By the above theorem, the series expansion will be 
convergent on the pdf in the range of the convolution operator, but will be divergent (most 
probably) on the noise term, as it is not in the range of the convolution operator (in general). 
Thus, the problem is that when to stop the series expansion: one should let the series go far 
enough to restore the original (unknown) pdf, but should stop the series expansion early enough 
to prevent the divergence arising from the noise term. This truncation procedure can be viewed 
as a very elegant way to do the high frequency regulation. Note, however, that the regulation 
problem at the finite frequencies (at the zeros of the Fourier transform of the convolver pdf) 
does not arise at all, with this method. 

The only remaining question is: at which index should one stop to keep the noise content 
lower than a given threshold. 

When working in practice, our density functions are discrete in general (e.g. histograms), 
thus we may view them as a vector of random variables (e.g. in the case of histograming, 
these random variables are the number of entries in the histogram bins). Let us denote it 
by v. If A is a linear operator (i.e. a matrix here), then we have that E(Av) = AE(v) and 
Covar(Af) = ACoven(v)A + , where we denote expectation value by E(-), covariance matrix by 
Covar (•), and the adjoint matrix by (-) + . Thus, in the iV-th step of the series expansion, we 
have 
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This means that if we have an initial estimate for the covariance matrix Covar(f), we can 
calculate the covariance matrix at each step, thus can calculate the propagated errors at each 
order. 

When using the method of histograming, as the entries in the histogram bins are known 
to obey independent Poisson distributions, the initial undistorted estimates E(vj) « jVj (i G 
{1, . . . , M}) and Covar(v) ps diag(iVi, . . . , N M ) will be valid, where we consider our histogram 
to be a mapping H : {1,...,M} — > No,i h- > iVj. The squared standard deviations are the 
diagonal elements of the covariance matrix, thus we can have an estimate on the L 1 -norm of 

the noise term at each iV-th order by taking — \j Covarjj (^2^=0 ~~ • By 

stopping the series expansion when this noise content exceeds a certain predefined threshold, 
we get the desired truncation of the series expansion. 

Remark 9. We show an other (iterative) form of our series expansion which may be more 
intuitive for physicists. Namely, take the initial conditions 

fo '■= A Pv H, 

Co := A Prj dmg(H), C := (A Prt C^ + . 
Then, perform the iteration steps 

/jv+i := fN + fo — Ap v A v f N , 

Cn+i '■= Cn + Co — Ap v A v Cn, Cn+i '■— (c N + Cq — Ap^A^C^j . 

Here H means the initial (measured) histogram, fa is the deconvolved histogram at the N-th 
step, and A Prj A n is the discrete version of the double convolution operator. The quantity Cn is 
a supplementary quantity, and Cn is the covariance matrix at each step. The noise content can 
be written as M * ^2fLi \f (Cn)u, which should be kept under a certain predehned threshold. 

Remark 10. As pointed out in the previous remark, one can exactly follow the error propaga- 
tion during the iteration. However, to store and to process the whole covariance matrix can cost 
a lot of memory and CPU-time. Therefore, one may rely on a slightly more pessimistic but less 
costly approximation of the error propagation, namely on the Gaussian error propagation. This 
means, that at each step one assumes the covariance matrix to be approximately diagonal, i.e. 
this method is based on the neglection of correlation of entries (which, indeed, holds initially), 
that slightly will overestimate the error content. Gaussian error propagation means that when 
calculating the action of the operators in questions, we apply the following two rules: 

1. ifv is a random variable (histogram entry), and a is a number, then a (a ■ v) := \a\ ■ a(v) 
(this is exact, of course), and 

2. if Vi and v 2 are random variables (histogram entries), then cr 2 iyi + v 2 ) '■— cr 2 (^i) + <? 2 (v2) 
(which is exact only if V\ and v 2 are uncorrelated) . Here a means standard deviation. 

Remark 11. Even if the convergence condition for the deconvolution by series expansion is 
satished for A v , it is better to use the double deconvolution procedure by Ap v A v , for the follow- 
ing reason. In practice the measured pdf corresponds to a pdf in the range of A v plus a noise 
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term. When convolving the measured pdf by Prj before the iteration, the noise level is reduced 
by orders of magnitudes (the convolution by Prj smooths out the statistical fluctuations). As 
a thumb rule, one iteration step is lost with the convolution by Prj, but several iteration steps 
are gained, as we start the iteration from a much lower noise level. 



For the case of general unfolding problems, a series expansion will become even more inter- 
esting, as there are no known alternative methods like the naive deconvolution in the case of 
deconvolution problems. 

Unfortunately, for the general case of unfolding, we cannot state such a strong result as for 
the case of deconvolution. This is because our theorem on the deconvolution strongly relies 
on the relation of convolutions and Fourier transformation. However, we can state a sufficient 
condition for the convergence of a series expansion for the general case of unfolding. To state 
this theorem, we have to perform studies not only on pdfs, but also on probability measures. 
The spaces X and Y are going to denote finite dimensional vector spaces again. 

A complex measure P on X is a complex valued a-additive set function defined on on the 
Borel a-algebra of X. The variation of the complex measure P is the nonnegative measure 
\P\ defined as follows: if E is a Borel set, then \P\(E) is the supremum of J2k=i 1-^(^)1 for 
all splitting (Ex, ■ ■ ■ , E n ) of E, i.e. for all such (Ex, ■ ■ ■ , E n ) finite system of disjoint Borel sets 
whose union totals up to E (|8J, |6J). The measures with finite variation (i.e. the complex 
measures P for which |P|(X) < oo) form a Banach space with the norm being the value of the 
variation on X, i.e. ||P|| := |P|(X). Let us denote this space by M(X). 

Recall that a probability measure P on a X is a nonnegative measure on the Borel a-algebra 
of X, with P(X) = 1. Thus, a probability measure is evidently in M(X). 

Definition 12. We shall call a mapping Q : X — > M(Y),x h- > Q(-\x) a folding measure if for 
every i6l the measure Q(-\x) is a probability measure on Y , and for every Borel set E inY 
the function x i— > Q(E\x) is measurable. 

Note, that Q may be viewed as a conditional probability measure on the product space 
Y x X. Evidently, if p is a cpdf, then Q p (E\x) := f E p(y\x)dy defines a folding measure. 

Definition 13. Let Q be a folding measure Q. Then the linear map 



will be called the folding operator by Q. 
Remark 14. The following remarks are trivial. 

1. Such an operator is well de&ned, as for all points x G X and Borel sets E the inequality 
Q(E\x) < 1 holds, thus the function x i— »■ Q(E\x) is integrable by any measure with hnite 
variation. 

2. By the monotonicity of integration, such an operator is continuous and \\Aq\\ = 1, just 
as in the L 1 case. 



3 The general case of unfolding 





10 



A Robust Iterative Unfolding Method for Signal Processing 



3. The folding operator defined above can be viewed as a generalization of the folding op- 
erator A p : L 1 (X) — > L 1 (Y) dehned by a cpdf p. This is because L 1 (X) can naturally 
be embedded into M(X) by assigning to each f G L 1 (X) the measure E \— > Pf(E) := 
Je f( x ) Of course, if the folding measure Q p is dehned by a cpdf p, then the restriction 
of Aq p to L 1 (X) is just A p as dehned before. 

First, we generalize our deconvolution results to the space of measures with finite variation. 
Remark 15. The convolution of two measures F,G G M(X) can be dehned by 

F*G: E^ J F(E - x) dG(x), 

where E runs over all the Borel sets. (Of course, Pf*P g = Pf* g for any /, <? G L 1 (X).) 

The Fourier transformations can also be dehned on M(X), and have the same properties 
as in the L 1 case, except that the Riemann-Lebesgue lemma does not hold (i.e. the Fourier 
transform of a measure is a bounded continuous function but does not tend to zero at the 
infinity). Therefore, our previous results on the series expansion for the deconvolution (Theorem 
8) can directly be generalized to the probability measures, as the elements of M(X) can also 
be viewed as tempered distributions. 

As we remarked above for the deconvolution case, we have a powerful result also in the more 
general framework of measures with finite variation. However, we are still lacking an answer 
for the general cases of unfolding. 

Remark 16. The conditions of the original Neumann series expansion theorem fail also in the 
case of measures. 

1. We can observe that our series expansion is only meaningful for the case of a folding 
operator only when the spaces X and Y are the same. (Just as in the L 1 case.) 

2. Let us assume that Y = X. Then, it is easily obtained that a folding operator Aq does 
not satisfy the required condition \\I — Aq\\ < 1, in general. It is trivial by the triangle 
inequality of norms that \\I — Aq\\ < 2. We will show now that this inequality can be 
saturated for a wide class of folding measures. Let K n (y) (n G N) be a sequence of 
compact sets with nonzero Lebesgue measure, such that K n+1 (y) C K n (y) for each n G N 
and n K n (y) = {y}. Let us denote the complement of a set K n (y) by K^(y). Clearly, 

by considering the splitting (K n (y), K^(y)) of the Borel set X, one has: 

\(I - A Q )Sy\(X) 

> 5 y (K n (y)) - Q(K n {y)\y)\ + \8 v {K*{y)) - Q(K C n (y)\y) 

= \l-Q(K n (y)\y)\+Q(K C n (y)\y). 

At the equality, 5 y (K n (y)) = 1 and S y (K^(y)) = was used. Let us take the limit 
n — > oo on the right side. By the monotone continuity of measures, we have that 
lim Q(K n (y)\y) = Q({y}\y) and lim Q(K*(y)\y) = Q(X\{y}\y), furthermore by 
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the subtractivity of measures we have Q(X \ {y}\y) = Q(X\y) — Q({y}\y). As Q(-\y) is 
a probability measure, we also have Q(X\y) = 1. Thus, 

|(7 - A Q )S y \(X) > |1 - Q({y}\y)\ + (1 - Q({y}\y)). 

As the measure Q(-\y) cannot take up larger values then 1 on any Borel set, we conclude 
that 

\\I-Aq\\ >2-(l-Q({y}|y)). 

Thus, if there exists such a point y G X , where Q({y}\y) = 0, then \\I — Aq\\ = 2. When 
the folding measure Q p is dehned by a cpdf p, then Q p ({y}\y) = always holds (this 
is because a measure of the form Pf - for any function f G L 1 (X) - cannot have sharp 
points, i.e. such points where Pf({y}) ^ 0). Thus, \\l — Aq p \\ = 2 holds for any cpdf p, 
therefore the Neumann series cannot converge for Aq p in the M(X) operator norm. (But 
of course, even Q({y}\y) < \ is enough to violate \\I — Aq\\ < 1.) 

Just like in the convolution case, our strategy will be to require much weaker notions of 
convergence. By intuition, one would think that if for all x G X the Dirac-measures S x are 
restored by the method (in some sense of convergence), then this would be enough for the 
restoration of any other arbitrary measures with finite variation. We provide a similar result 
with slightly stronger conditions. The theorem below is a trivial consequence of Lebesgue's 
theorem of dominated convergence. 

Theorem 17. Let Aq be a folding operator for some folding measure Q. Let us fix a Borel set 
E in X . If for all x G X the sequence 

N^((I-A Q ) N+1 5 X ) (E) 

converges to zero, furthermore 

< oo 



sup sup 



((/ - A Q ) N+1 S x ) (E) 
holds, then for any P G M(X) the series 

is convergent and 

(Ea-^) n V] (E) = P(E). 

\n=0 J □ 

Proof First, we note that for any index iV the measurable function x h- > | ((/ — Aq) n+1 5x) (E) | 
can be bounded by 2 Ar+1 , thus these functions are integrable by any measure with finite varia- 
tion. 

We know that for all x G X the relation lim ( (I — Aq) N+1 5 x ) (E) = holds, further- 

more sup sup ((I - A Q ) N+l 8 X ) (E) < oo. The integral / ((/ - A Q ) N+1 5 X ) (E) dP(x) exists 

for all and the integrands converge pointwise to zero as N tends to infinity. As the in- 
tegrands are dominated by a constant independent of which is clearly |F|-integrable, by 
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Lebesgue's theorem of dominated convergence, the limit and the integration can be inter- 
changed: Jim / ((/ - A Q ) N+1 8 X ) (E) dP(x) = J Jim ((/ - A Q ) N+1 5 X ) (E) dP(x) = 0. On 

the left-hand side, (/ — Aq) can be interchanged with the integration, because / is the identity 
operator and because Aq itself is an integral: we can interchange the integrals by Fubini's 
theorem, namely / (A%6 X ) (E) dP(x) = //.../ Q(E\y N ) dQ(y N \y N ^) . . . dQ( Vl \x) dP(x) = 
{A^P) (E), for arbitrary power N. Thus, lim ((/ - A Q ) N+1 P) (E) = 0. 



Af- 
Using the equality [p - £" =0 (/ - A qY A qP) ( e ) = {i 1 ~ a q) N+1 P) (E), we get the de- 
sired result. ■ 

Remark 18. Assume that the condition of our theorem holds. 

1. The condition sup sup ( (I — Aq) n+1 S x ) (E) < oo (i.e. the condition of boundedness) 

is crucial for the proof in order to be able to interchange the limit and the integration. 
In other words: the restoration of the Dirac-measures 5 X for all x G X is not enough. 

2. If P is a probability measure, then the meaning of our convergence result is that the 
probability of the event (Borel set) E is restored: 

N 



v n=0 



The present theorem is weaker than the one for deconvolution, nevertheless it provides 
a computer-testable condition of convergence for any unfolding problem (which may not be 
expressed as convolution). In the next section, we shall provide some physical examples which 
show the method in operation. Of course, the iteration procedure goes just the same as discussed 
at the end of the previous section. 

Remark 19. If we are testing the convergence criterion by computer, some measure theory 
trivialities are useful. Namely, if the condition holds for disjoint sets, then it also holds for 
the union of them. Thus, in practice (e.g. when handling histograms) , it is enough to confirm 
the condition when the Borel sets E are the histogram bins, because then the condition will 
automatically hold for any set built up from the histogram bins. Of course, we cannot go below 
the granulation of our histogram binning, but if our granulation is hne enough, the numerical 
test of convergence condition can give an accurate answer. 

The disadvantage of our presented convergence criterion is that it is rather expensive even 
for a simple 1-dimensional case (however, for a given folding measure Q, this condition has to 
be shown only once). It may be better to only show the convergence for the given unfolding 
problem, i.e. on a case-by-case basis, and not for the general case of every P G M(X). (The 
disadvantage of such a convergence condition is that surely it will be violated after a certain 
iteration step, because of the divergence arising from the noise term.) Such a condition of 
convergence may be obtained by Cauchy's root criterion: 

Theorem 20. Let Aq be a folding operator for some folding measure Q. Let us fix a measure 
P G M(X) and a Borel set E in X . If the inequality 



hmsup n JW-Aq)"A q P) (E)\ < 1 

N v 



13 



A Robust Iterative Unfolding Method for Signal Processing 



holds, then the series 

N -> ( f> " a qT a q p ) ( E ) 

\n=0 J 

is absolute convergent. □ 

With the above condition one may control the convergence of the series iteration for a given 
measured pdf: the condition limsup sup — Aq) n AqP) (E)\ < 1 may be required as a 

N E 

condition of convergence, where the Borel sets E are the histogram bins. Given the order N, 
we shall call the number sup - A Q ) N A Q P) {E)\ the Cauchy index. 

E 

Remark 21. The iteration scheme is the same as discussed at the end of the previous section 
(Remark 9). In the iteration scheme, the convolution operator A Prj should be replaced by some 
folding operator A G (used to artificially smear the measured histogram in order to reduce the 
noise content, as pointed out in Remark 11 - typically this may be chosen to be a convolution 
operator by a Gauss pdf centered to zero, or can be chosen to be the identity operator, if 
smoothing is not needed), and the convolution operator A v should be replaced by the folding 
operator Aq (describing the physical smearing process). 



4 Examples and applications in physics 

Our first test example will be a deconvolution problem of an initial Cauchy pdf of the form 
x l— *• \ ' r 2 +x 2 ) and with a Gauss convolver pdf of the form x i— > y=p • exp ^— over the 
real numbers. We will choose T = 1 and o = 1 in our example. By Theorem 8 we can assure 
the convergence of the problem. The result is shown in Figure^ 

Our second test example will be a deconvolution problem of an initial Cauchy pdf as in the 
previous example with a triangle convolver pdf of the form iHp - X\- W ,w\ i x ) '\W — \x\ \ over 
the real numbers. We will choose W = 2 in our example. By Theorem 8 we can also assure 
the convergence of the problem. The result is shown in Figured 

A signal smearing, caused by a measurement apparatus, is described by folding in general. 
In this case the cpdf in the folding integral is the response function of the device. Our series 
unfolding can be applied to remove the non-ideal detector smearing at the spectrum level. This 
is a common issue in analysis of recorded data in experimental physics, which may be solved 
by our method. 

Our physical example will be the n decay. 7T°-s are produced in high-energy particle col- 
lisions (e.g. in hadron or heavy-ion collisions). The particle 7r° decays through the channel 
7T° — > 7 + 7 decay (98.798% branching ratio). It has such a short lifetime (8.4 ■ 1CT 17 sec), that 
even in the highest energy colliders it only travels at most micrometers before decay, thus from 
the detector's point of view, the resulting 7 photons come from the collision point. The irO 
particles are detected via the resulting 7 photon pairs. This is possible because the dominant 
part of the 7 yield comes from n° decays in hadron or heavy-ion collisions. The 7 candidate 
signals are paired to each other in every possible combination, and the mass of each pair is 
calculated from the hypothesis that they originate from a common ir° decay. The combinato- 
rial background is estimated by so called event mixing techniques (by taking 7 candidates from 
different events, thus these signals are completely independent). The tv° yield as a function of 
momentum thus can be obtained, which plays an important role in high-energy particle physics. 
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Figure 1: A Gauss*Cauchy deconvolution by series expansion 
Order=63, error content=10% 
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Figure 2: A triangle*Cauchy deconvolution by series expansion 
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However, in certain cases (e.g. in heavy-ion collisions) the reconstruction efficiency of 7r°-s 
can be very low at certain momentum space regions, thus this straightforward reconstruction 
method is not always applicable for measuring the momentum distribution of the produced 

7T°-S. 

A possible idea is to measure the single 7 momentum distribution, and reconstruct the parent 
7T° momentum distribution from it, somehow. The arising of the child 7 photon momentum 
pdf from a parent 7r° momentum pdf is described by a folding, as will be discussed below. The 
task is: to unfold the original n° momentum pdf from the 7 momentum pdf. This issue was 
also addressed in however the answer given by the paper was not fully satisfactory. Firstly, 
the method described in the paper was very specific to the particular case of 7r° — > 7 + 7 decay 
(and did not deal with the general problem of unfolding). Secondly, two kinematical kind of 
approximations were used which are mathematically ill-defined and have an unclear physical 
meaning. It seems, indeed, that our method gives a more realistic answer, as it will be shown. 

Let us denote the momentum space by (M, g), where M is a 4-dimensional real vector space, 
and g : M x M ^ R is a Lorentz form (with signature 1,-1,— 1,-1). Let us choose a time 
orientation on it. Let ^ + (0) denote the positive null cone (positive light cone), and let V + (m) 
be the positive mass shell with mass value m (m will now play the role of n° mass). The 7r° 
momentum pdf is defined over V + (m), and the 7 photon momentum pdf is defined over V^O). 
However, they also can be be viewed as probability measures over M, with their support in 
V + {m) and F + (0), respectively. Given a n° momentum, the 7 momenta directions (decay axes) 
are uniformly distributed in the n° rest frame (this is the physical information put in). Namely, 
let us take the set 



F := < (p, k) G M x 



p G V + (m), k G V + (0), g ( p, k 

V9{P,P) 



and let us define for every p G M the set F p := {k G M | (p, k) G F}. Clearly, F p is the set of 
possible 7 photon momenta arising from a ir° with momentum p (in other words: F p is defined 
by the vectors in V + (0) which have energy y in the rest frame of the 7T° with momentum p). We 
shall define our folding measure by: Q(-\p) is the measure over M for each p which describes the 
uniform distribution on F p (as F p is compact, it has finite measure, thus this is meaningful). 
If P is a probability measure over M describing the 7r° momentum distribution, then the 7 
photon momentum distribution is defined by the probability measure AqP. Thus, one may 
try to obtain the parent n° momentum distribution by unfolding the measured 7 momentum 
distribution. This will be done explicitly below for a toy example. 

Let us parameterize the momentum space with respect to an Einstein synchronized frame 
(eo, e±, e-ii 63) that corresponds to the center-of-mass system of the collision. We choose the 
collision axis (the beam axis) to be the third spatial coordinate axis which we also call the 
longitudinal direction. As the experimental setups of collisions are axially symmetric with 
respect to this axis, the single particle momentum distributions are axially symmetric with re- 
spect to the longitudinal direction. Therefore, it is convenient to parameterize a n° momentum 

p G V + {m) in the form (g(es,p), \/ g{ei,p) 2 + <?(e2,p) 2 , arctan f ^ip) ) ) ' an ^ a T momen tum 

k G V^ + (0) in the form (g(e 3 ,k), \Jg(ei, k) 2 + g(e 2 , k) 2 , arctan f • The three coordi- 

nates are called longitudinal momentum, transverse momentum and azimuth, respectively. The 
axial symmetry means that the pdfs describing 7r° and 7 momentum distributions only depend 
on the longitudinal and transverse momentum. 
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It is even more convenient to introduce a more sophisticated parameterization: if p L is the 

longitudinal momentum and p„ is the transverse momentum, then y := asinh ( , P L = I (lon- 

1 V v m P T J 

gitudinal rapidity) and E T := ^Jm 2 + p| (transverse energy) can be introduced. The so called 
longitudinal pseudorapidity r\ := asinh (jpj is also useful for longitudinal parameterization. 
We shall present the pdfs in the (rj, p T ) parameterization. 

For demonstration, we take a realistic toy example of ir° momentum pdf. The n° momentum 
pdf is characterized by: the momentum pdf of the it with respect to the Lorentz invariant 
measure of the mass shell V + {m) corresponds to a product of a Gaussian one in y and an 
exponential one in E T (a typical experimental spectrum can be qualitatively described in this 
a way). The standard deviation of the y distribution was taken to be 0.5, and the inverse slope 
parameter of the E T distribution was taken to be 0.5GeV. 

The initial n° momentum pdf is presented in Figure\^togethev with the arising 7 momentum 
pdf. We used a sample of 10000000 Monte Carlo 7r° particles to generate the measured 7 
spectrum. 

The unfolded n° momentum pdf is presented in Figure ^] together with the initial n° mo- 
mentum pdf. Due to the high statistics, we did not apply smearing for noise reduction (as 
discussed in Remark 21). 

To demonstrate the capability of the method, we also included a smearing according to 
the CMS-ECAL detector's known energy and angular resolution function, when generating the 
measured gamma responses: the method also removes this detector effect from the momentum 
pdf. This fact is rather important in practice, because a non-ideal detector resolution changes 
the inverse slope parameter of the transverse momentum spectrum remarkably, which is used 
in heavy-ion physics to determine the temperature of the collided system. 

Some sections of the previous pdfs are also presented at rj = constant slices in Figured and 
in Figured 

For completeness, we also show the answer given by R. Cahn's prescription (as described in 
|3j), in Figure^ and Figured Of course, here we did not include additional detector effects as 
in our unfolding case, as R. Cahn's method was not designed to undo detector effects. As one 
can see, the reconstructed 7r° momentum pdf given by R. Cahn's prescription is rather far from 
the initial one, especially when compared to the answer given by our series expansion method, 
introduced in this paper. 

Our remaining issue is to show the convergence of our series expansion for this 7r° — > 7 + 7 
decay unfolding problem. In Figure\J^we plotted the Cauchy index as a function of the iteration 
order. It is clearly seen that the Cauchy indices are saturating to ~ 0.8, thus the convergence 
is a consequence of Theorem 20. 

Remark 22. It is very important to note that when implementing the folding operator, one 
does not have to know the analytic form of the integral. In the it — > 7 + 7 case it is possible 
to calculate the integral formula analytically from kinematics, however, the integral becomes 
very ugly in the (77, p T ) parameterization. Therefore we calculated the action of the folding 
operator by Monte Carlo simulation, which makes the method easy to implement. 
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Figure 3: Input ir° momentum pdf and measured 7 momentum pdf 
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Figure 4: Input 7r° momentum pdf and unfolded ir° momentum pdf 
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5 Concluding remarks 

A robust iterative deconvolution and unfolding method was developed for applications in signal 
processing. The method has three main advantages: 

1. It solves any deconvolution problem optimally. 

2. It also solves a wide class of more general unfolding problems (for which no general 
unbiased method was known previously). 

3. The method is quite easy to implement even for sophisticated folding problems, if Monte 
Carlo integration method is applied. 
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Figure 5: Input ir° momentum pdf, measured 7 momentum pdf, and reconstructed ir° momentum pdf: taken 
at the r\ = 0.0 slice 
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Figure 6: Input ir° momentum pdf, measured 7 momentum pdf, and reconstructed ir° momentum pdf: taken 
at the 77 = 0.4 slice 
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Figure 7: Input tt° momentum pdf, measured 7 momentum pdf, and reconstructed ir° momentum pdf with 
R. Cahn's method: taken at the 77 = 0.0 slice 
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Figure 8: Input tt° momentum pdf, measured 7 momentum pdf, and reconstructed 7r° momentum pdf with 
R. Cahn's method: taken at the r\ = 0.4 slice 
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Figure 9: Cauchy convergence test of the series expansion for the ir° — > 7 + 7 problem 
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